Voice Quality Modelling for Expressive Speech Synthesis
Authors
Abstract
This paper presents perceptual experiments carried out to validate a methodology for transforming speech from a neutral style into a number of expressive styles by modelling voice quality (VoQ) parameters along with the well-known prosodic parameters (F0, duration, and energy). The main goal was to validate the usefulness of VoQ in enhancing expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify the VoQ and prosodic parameters that were extracted from an expressive speech corpus. The perception test results indicated that modelling VoQ along with prosodic characteristics improved the resulting expressive speech styles.
Similar resources
Investigating HMMs as a parametric model for expressive speech synthesis in German
The paper investigates the potential of HMM-based synthesis to support the parameterisation of expressive speech in German. First, we review the assets of HMMs from the perspective of previous work in speech modelling and speech transformation. It is shown that HMMs define a flexible parametric model of the speech acoustics, which readily integrates several levels of speech modelling, such as di...
Expressive text-to-speech approaches
The core concern of this paper is the modelling and the tractability of expressiveness in natural voice synthesis. In the first part we quickly discuss the imponderable gap between natural and singing voice synthesis approaches. In the second part we outline a four level model and a corpus-based methodology in modelling expressive forms—an essential step towards expressive voice synthesis. We t...
Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters
A great challenge for text-to-speech synthesis is to produce expressive speech. The main problem is that it is difficult to synthesise high-quality speech using expressive corpora. With the increasing interest in audiobook corpora for speech synthesis, there is a demand to synthesise speech which is rich in prosody, emotions and voice styles. In this work, Self-Organising Feature Maps (SOFM) ar...
A comparison of voice conversion methods for transforming voice quality in emotional speech synthesis
This paper presents a comparison of methods for transforming voice quality in neutral synthetic speech to match cheerful, aggressive, and depressed expressive styles. Neutral speech is generated using the unit selection system in the MARY TTS platform and a large neutral database in German. The output is modified using voice conversion techniques to match the target expressive styles, the focus...
Interpolating Expressions in Unit Selection
In expressive speech synthesis, a key challenge is the generation of flexibly varying expressive tone while maintaining the high quality achieved with unit selection speech synthesis methods. Existing approaches have either concentrated on achieving high synthesis quality with no flexibility, or they have aimed at parametric models, requiring the use of parametric synthesis technologies such as...
Journal title:
Volume 2014, Issue
Pages -
Publication date: 2014